
    280 Birds with One Stone: Inducing Multilingual Taxonomies from Wikipedia using Character-level Classification

    We propose a simple, yet effective, approach towards inducing multilingual taxonomies from Wikipedia. Given an English taxonomy, our approach leverages the interlanguage links of Wikipedia followed by character-level classifiers to induce high-precision, high-coverage taxonomies in other languages. Through experiments, we demonstrate that our approach significantly outperforms the state-of-the-art, heuristics-heavy approaches for six languages. As a consequence of our work, we release presumably the largest and the most accurate multilingual taxonomic resource, spanning over 280 languages.
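
    A minimal sketch of how such a character-level classification step could look, assuming the task is to score candidate (category, parent) pairs projected from the English taxonomy through interlanguage links; the features, model, and example data below are illustrative, not the paper's exact setup:

```python
# Illustrative only: a character n-gram classifier that scores whether a
# candidate (category, parent) pair in a target language is a valid is-a edge.
# Training pairs would be obtained by projecting English taxonomy edges through
# Wikipedia interlanguage links; the tiny dataset here is made up.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def pair_to_text(child, parent):
    # Concatenate the two titles so character n-grams can span the boundary.
    return f"{child} ## {parent}"

train_pairs = [
    ("Città della Francia", "Geografia della Francia", 1),
    ("Parigi", "Città della Francia", 1),
    ("Parigi", "Musica barocca", 0),
]
X = [pair_to_text(c, p) for c, p, _ in train_pairs]
y = [label for _, _, label in train_pairs]

# Character n-grams keep the classifier language-agnostic: no tokenizers or
# language-specific resources are required.
model = make_pipeline(
    CountVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
model.fit(X, y)

# Score an unseen candidate edge in the target language.
print(model.predict_proba([pair_to_text("Lione", "Città della Francia")]))
```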

    Data-Driven, Personalized Usable Privacy

    We live in the "inverse-privacy" world, where service providers derive insights from users' data that the users themselves do not even know about. This has been fueled by advancements in machine learning, which allow providers to go beyond the superficial analysis of users' transactions to the deep inspection of users' content. Users have been struggling to cope with this widening information discrepancy. Although the interfaces of apps and websites are generally equipped with privacy indicators (e.g., permissions, policies, ...), these have not been enough to counter the effect. We identify three gaps that hinder the effectiveness and usability of privacy indicators:
    - Scale Adaptation: The scale at which service providers collect data has been growing on multiple fronts, while users have limited time, effort, and technological resources to cope with it.
    - Risk Communication: Although providers use privacy indicators to announce what and (less often) why they need particular pieces of information, they rarely convey what can potentially be inferred from this data. Without this knowledge, users are less equipped to make informed decisions when they sign in to a site or install an application.
    - Language Complexity: The information practices of service providers are buried in long, complex privacy policies. Users generally lack the time, and sometimes the skills, to decipher such policies, even when they are interested in particular parts of them.
    In this thesis, we approach usable privacy from a data perspective. Instead of static privacy interfaces that are obscure, recurring, or unreadable, we develop techniques that bridge the understanding gap between users and service providers. Towards that, we make the following contributions:
    - Crowdsourced, data-driven privacy decision-making: To combat the growing scale of data exposure, we consider the context of files uploaded to cloud services. We propose C3P, a framework for automatically assessing the sensitivity of files, thus enabling real-time, fine-grained policy enforcement on top of unstructured data.
    - Data-driven app privacy indicators: We introduce PrivySeal, a new paradigm of dynamic, personalized app privacy indicators that bridge the risk-understanding gap between users and providers. Through PrivySeal's online platform, we also study the emerging problem of interdependent privacy in the context of cloud apps and provide a usable privacy indicator to mitigate it.
    - Automated question answering about privacy practices: We introduce PriBot, the first automated question-answering system for privacy policies, which allows users to pose questions about the privacy practices of any company in their own language. Through a user study, we show its effectiveness at achieving high accuracy and relevance for users, thus narrowing the complexity gap in navigating privacy policies.
    A core aim of this thesis is paving the road for a future where privacy indicators are not bound to a specific medium or pre-scripted wording. We design and develop techniques that enable privacy to be communicated effectively through interfaces that are approachable to the user. To that end, we go beyond textual interfaces to enable dynamic, visual, and hands-free privacy interfaces fit for the variety of emerging technologies.
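
    As one illustration of the question-answering contribution, the sketch below ranks privacy-policy segments against a user's free-form question with plain TF-IDF retrieval; PriBot's actual models are richer, so this is only a hypothetical approximation of the idea:

```python
# Illustrative only: answer a free-form question by retrieving the most similar
# privacy-policy segments with TF-IDF; the real system goes well beyond this.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

policy_segments = [
    "We collect your email address and usage statistics to improve the service.",
    "We may share aggregated, anonymized data with advertising partners.",
    "You can request deletion of your account data at any time.",
]

def answer(question, segments, top_k=2):
    vectorizer = TfidfVectorizer().fit(segments + [question])
    seg_vecs = vectorizer.transform(segments)
    q_vec = vectorizer.transform([question])
    scores = cosine_similarity(q_vec, seg_vecs)[0]
    # Return the top-k segments, most similar first.
    return sorted(zip(scores, segments), reverse=True)[:top_k]

for score, segment in answer("Do you sell my data to advertisers?", policy_segments):
    print(f"{score:.2f}  {segment}")
```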

    The Curious Case of the PDF Converter that Likes Mozart: Dissecting and Mitigating the Privacy Risk of Personal Cloud Apps

    Third-party apps that work on top of personal cloud services, such as Google Drive and Dropbox, require access to the user's data in order to provide some functionality. Through a detailed analysis of a hundred popular Google Drive apps from Google's Chrome store, we discover that the existing permission model is quite often misused: around two thirds of the analyzed apps are over-privileged, i.e., they access more data than they need in order to function. In this work, we analyze three different permission models that aim to discourage users from installing over-privileged apps. In experiments with 210 real users, we find that the most successful permission model is our novel ensemble method, which we call Far-reaching Insights. Far-reaching Insights inform users about the data-driven insights that apps can make about them (e.g., their topics of interest, collaboration patterns, and activity patterns). Thus, they seek to bridge the gap between what third parties can actually know about users and users' perception of their privacy leakage. The efficacy of Far-reaching Insights in bridging this gap is demonstrated by our results: Far-reaching Insights prove to be, on average, twice as effective as the current model in discouraging users from installing over-privileged apps. In an effort to promote general privacy awareness, we deploy a publicly available, privacy-oriented app store that uses Far-reaching Insights. Based on the knowledge extracted from the data of the store's users (over 115 gigabytes of Google Drive data from 1440 users with 662 installed apps), we also delineate the ecosystem of third-party cloud apps from the standpoint of developers and cloud providers. Finally, we present several general recommendations that can guide future work in the area of privacy for the cloud.
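
    A hypothetical sketch of the over-privilege check described above: compare the scopes an app requests against the minimal scopes its declared functionality needs. The scope names and the needs-mapping are made up for illustration:

```python
# Illustrative only: flag an app as over-privileged when its requested OAuth
# scopes exceed the minimal scopes its declared functionality needs.
# Scope names and the needs-mapping below are hypothetical.
MINIMAL_SCOPES = {
    "pdf_converter": {"drive.file"},        # only files the user opens with the app
    "backup_tool": {"drive.readonly"},
}

def excess_scopes(app_type, requested_scopes):
    needed = MINIMAL_SCOPES.get(app_type, set())
    return set(requested_scopes) - needed   # non-empty => over-privileged

excess = excess_scopes("pdf_converter", {"drive", "drive.file"})
if excess:
    print("Over-privileged; unnecessary scopes:", sorted(excess))
```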

    Scalable and Secure Aggregation in Distributed Networks

    We consider the problem of computing an aggregation function in a secure and scalable way. Whereas previous distributed solutions with similar security guarantees have a communication cost of $O(n^3)$, we present a distributed protocol that requires only a communication complexity of $O(n \log^3 n)$, which we prove is near-optimal. Our protocol ensures perfect security against a computationally bounded adversary, tolerates $(1/2 - \epsilon)n$ malicious nodes for any constant $1/2 > \epsilon > 0$ (not depending on $n$), and outputs the exact value of the aggregated function with high probability.
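
    To illustrate the security goal only (the sum is revealed, individual inputs are not), here is a toy additive secret-sharing aggregation; note that this naive scheme uses $O(n^2)$ messages and tolerates no malicious behavior, unlike the paper's $O(n \log^3 n)$ protocol:

```python
# Illustrative only: additive secret sharing, where each node splits its input
# into n shares that sum to the input modulo a prime. Nodes publish only sums
# of received shares, so the aggregate is revealed but no individual input is.
import random

P = 2**61 - 1  # a large prime modulus

def share(value, n):
    shares = [random.randrange(P) for _ in range(n - 1)]
    shares.append((value - sum(shares)) % P)
    return shares

inputs = [12, 7, 30, 5]            # each node's private value
n = len(inputs)

# Node i sends its j-th share to node j (this is the O(n^2) communication step).
all_shares = [share(v, n) for v in inputs]

# Node j sums the shares it received and publishes only that partial sum.
partial_sums = [sum(all_shares[i][j] for i in range(n)) % P for j in range(n)]

# Anyone can now recover the aggregate, but not the individual inputs.
print(sum(partial_sums) % P)       # prints 54
```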

    Data-Driven Privacy Indicators

    Third-party applications work on top of existing platforms that host users' data. Although these apps access this data to provide users with specific services, they can also use it for monetization or profiling purposes. In practice, there is a significant gap between users' privacy expectations and the actual access levels of third-party apps, which are often over-privileged. Due to weaknesses in the existing privacy indicators, users are generally not well informed about what data these apps get. What is more, we are witnessing the rise of inverse privacy: third parties collect data that enables them to know information about users that the users themselves do not know, cannot remember, or cannot reach. In this paper, we describe our recent experiences with the design and evaluation of Data-Driven Privacy Indicators (DDPIs), an approach that attempts to reduce the aforementioned privacy gap. DDPIs are realized by having a trusted party (e.g., the app platform) analyze the user's data and integrate the analysis results into the privacy indicator's interface. We discuss DDPIs in the context of third-party apps on cloud platforms such as Google Drive and Dropbox. Specifically, we present our recent work on Far-reaching Insights, which show users the insights that apps can infer about them (e.g., their topics of interest, collaboration patterns, and activity patterns). We then present History-based Insights, a novel privacy indicator that informs the user of what data is already accessible to an app vendor, based on previous app installations by the user or her collaborators. We further discuss future ideas for new DDPIs, and we outline the challenges facing the wide-scale deployment of such indicators.
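
    A hypothetical sketch of a History-based Insight computation: estimate which of a user's files a given vendor may already reach through apps previously installed by the user or by collaborators on shared files. The data structures and names are illustrative:

```python
# Illustrative only: estimate which of a user's files a vendor can already
# reach because the user or a collaborator previously installed that vendor's
# apps. All names and data structures are hypothetical.
installed_apps = {
    "alice": {"PDFConverter"},
    "bob": {"MindMapper", "PDFConverter"},
}
app_vendor = {"PDFConverter": "AcmeCorp", "MindMapper": "AcmeCorp"}
# file -> users who can access it (owner plus collaborators)
file_access = {
    "report.docx": {"alice", "bob"},
    "notes.txt": {"alice"},
}

def vendor_reachable_files(vendor, user):
    reachable = set()
    for f, users in file_access.items():
        if user not in users:
            continue
        # Any user with access who installed one of the vendor's apps may
        # already have exposed this file to that vendor.
        if any(app_vendor.get(a) == vendor
               for u in users for a in installed_apps.get(u, ())):
            reachable.add(f)
    return reachable

print(sorted(vendor_reachable_files("AcmeCorp", "alice")))
# ['notes.txt', 'report.docx']
```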

    PBCOV: a property-based coverage criterion

    Coverage criteria aim at satisfying test requirements and compute metric values that quantify the adequacy of test suites at revealing defects in programs. Typically, a test requirement is a structural program element, and the coverage metric value represents the percentage of elements covered by a test suite. Empirical studies show that existing criteria might characterize a test suite as highly adequate even when it does not actually reveal some of the existing defects. In other words, existing structural coverage criteria are not always sensitive to the presence of defects. This paper presents PBCOV, a Property-Based COVerage criterion, and empirically demonstrates its effectiveness. Given a program with properties therein, static analysis techniques, such as model checking, leverage formal properties to find defects. PBCOV is a dynamic analysis technique that also leverages properties and is characterized by the following: (a) it considers the state space of first-order logic properties as the test requirements to be covered; (b) it uses logic synthesis to compute the state space; and (c) it is practical, i.e., computable, because it considers an over-approximation of the reachable state space using a cut-based abstraction. We evaluated PBCOV using programs with test suites comprising passing and failing test cases. First, we computed metric values for PBCOV and structural coverage using the full test suites. Second, in order to quantify the sensitivity of the metrics to the absence of failing test cases, we computed the values of all considered metrics using only the passing test cases. In most cases, the structural metrics exhibited little or no decrease in their values, while PBCOV showed a considerable decrease. This suggests that PBCOV is more sensitive to the absence of failing test cases, i.e., it is more effective at characterizing test suite adequacy to detect defects and at revealing deficiencies in test suites.
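
    A simplified, hypothetical illustration of the coverage idea: treat the valuations of the predicates appearing in a property as the test requirements, and measure the fraction exercised by the test suite (PBCOV's logic synthesis and cut-based abstraction of the reachable state space are omitted):

```python
# Illustrative only: treat valuations of a property's predicates as the test
# requirements and measure how many the test suite exercises. PBCOV restricts
# this to an over-approximated reachable state space, which is omitted here.
from itertools import product

# Predicates over the observed program state (a toy bank-account example).
predicates = {
    "balance_nonneg": lambda s: s["balance"] >= 0,
    "withdraw_ok": lambda s: s["amount"] <= s["balance"],
}

def valuation(state):
    # The tuple of predicate truth values reached by one observed state.
    return tuple(p(state) for p in predicates.values())

# States observed while running the (passing and failing) test cases.
observed_states = [
    {"balance": 100, "amount": 30},
    {"balance": 100, "amount": 150},
]

covered = {valuation(s) for s in observed_states}
all_valuations = set(product([False, True], repeat=len(predicates)))
print(f"property-based coverage: {len(covered)}/{len(all_valuations)}")  # 2/4
```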

    ACCOP: Adaptive Cost-Constrained and Delay-Optimized Data Allocation over Parallel Opportunistic Networks

    As wireless and mobile technologies become increasingly pervasive, uninterrupted connectivity on mobile devices is becoming a necessity rather than a luxury. In challenged networking environments, this necessity becomes harder to achieve in the absence of end-to-end paths from servers to mobile devices. One of the main techniques employed under such conditions is to use the available parallel networks simultaneously. In this work, we tackle the problem of data allocation to parallel networks in challenged environments, targeting minimal delay while abiding by a user-preset budget. We propose ACCOP, an Adaptive, Cost-Constrained, and delay-OPtimized data-to-channel allocation scheme that efficiently exploits the parallel channels typically accessible from mobile devices. Our technique replaces traditional, inefficient brute-force schemes by employing Lagrange multipliers to minimize the delivery delay. Furthermore, we show how ACCOP can dynamically adjust to changing network conditions. Through analytical and experimental tools, we demonstrate that our system achieves faster delivery and higher performance while remaining computationally inexpensive.
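
    A hypothetical sketch of the underlying optimization: allocate data across parallel channels to minimize completion time subject to a budget. The rates, costs, and minimax-delay formulation below are assumed for illustration and solved numerically rather than via ACCOP's Lagrange-multiplier derivation:

```python
# Illustrative only: split D megabytes across parallel channels with rates r
# and per-MB costs c so that the completion time t = max_i x_i / r_i is
# minimized while the total cost stays within budget B. Solved numerically
# here; ACCOP's own derivation uses Lagrange multipliers and adapts online.
import numpy as np
from scipy.optimize import minimize

r = np.array([5.0, 2.0, 1.0])   # assumed channel rates (MB/s)
c = np.array([0.5, 0.1, 0.02])  # assumed per-MB costs
D, B = 100.0, 30.0              # total data (MB) and budget

# Decision variables: z = [x_1, x_2, x_3, t]
constraints = [
    {"type": "eq", "fun": lambda z: np.sum(z[:-1]) - D},       # deliver all data
    {"type": "ineq", "fun": lambda z: B - np.dot(c, z[:-1])},  # stay within budget
    {"type": "ineq", "fun": lambda z: z[-1] - z[:-1] / r},     # x_i / r_i <= t
]
z0 = np.append(np.full(3, D / 3), D / r.sum())                 # rough starting point

res = minimize(lambda z: z[-1], z0, bounds=[(0, None)] * 4,
               constraints=constraints, method="SLSQP")
x, t = res.x[:-1], res.x[-1]
print("allocation (MB):", np.round(x, 1), " delay (s):", round(t, 1))
```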